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Abstract 

In  recent  years,  small-scale  quantum  information  processors  have  been  realized  in  multiple  physical 
architectures.  These  systems  provide  a  universal  set  of  gates  that  allow  one  to  implement  any  given 
unitary  operation.  The  decomposition  of  a  particular  algorithm  into  a  sequence  of  these  available  gates 
is  not  unique.  Thus,  the  fidelity  of  the  implementation  of  an  algorithm  can  be  increased  by  choosing 
an  optimized  decomposition  into  available  gates.  Here,  we  present  a  method  to  find  such  a 
decomposition,  where  a  small-scale  ion  trap  quantum  information  processor  is  used  as  an  example. 
We  demonstrate  a  numerical  optimization  protocol  that  minimizes  the  number  of  required  multi¬ 
qubit  entangling  gates  by  design.  Furthermore,  we  adapt  the  method  for  state  preparation,  and 
quantum  algorithms  including  in-sequence  measurements. 


1.  Introduction 

Quantum  technologies  open  new  possibilities  that  are  inaccessible  with  current  classical  devices,  ranging  from 
cryptography  [1, 2]  to  efficient  simulation  of  physical  systems  [3-5].  To  utilize  the  full  computational  power  of 
quantum  systems,  one  needs  a  universal  quantum  computer :  a  device  able  to  implement  arbitrary  unitary 
operations,  or  at  least  to  approximate  them  to  arbitrary  accuracy.  However,  in  any  specific  physical  system,  only 
a  certain  set  of  operations  is  readily  available.  Therefore,  it  is  necessary  to  decompose  the  desired  unitary 
operation  as  a  sequence  of  these  experimentally  available  gates.  An  available  set  of  gates  is  known  as  universal  if  it 
is  possible  to  find  such  a  decomposition  for  an  arbitrary  unitary  quantum  operation  acting  on  the  qubit  register. 

A  canonical  universal  set  of  gates  consists  of  two-qubit  CNOT  gates  and  arbitrary  single-qubit  rotations. 
There  exist  deterministic  algorithms  that  provide  near- optimal  decompositions  of  unitaries  in  terms  of  these 
gates  [6].  However,  the  set  of  gates  that  yields  the  highest  fidelities  depends  on  the  particular  experimental 
implementation.  In  particular,  two-qubit  CNOT  gates  may  not  be  the  most  efficient  to  implement. 
Architectures  like  trapped  ions  [7, 8]  or  atom  lattices  [9]  include  in  their  toolboxes  high-fidelity  multi-qubit 
gates  that  act  on  the  entire  qubit  register  (see  section  1.1).  Implementing  two-qubit  gates  in  terms  of  these 
requires  refocusing  [10]  or  decoupling  [7]  techniques,  and  thus  increases  the  overhead.  Therefore  it  is  desirable 
to  find  a  direct  decomposition  of  the  target  unitary  into  the  available  operations.  In  general,  the  number  of 
multi- qubit  gates  needs  to  be  minimized,  since  these  are  more  prone  to  errors  than  local  gates. 

Compiling  unitaries  using  multi- qubit  gates  that  act  on  the  whole  qubit  register  is  more  challenging  than 
using  two-qubit  gates.  Even  if  a  sequence  implements  correctly  a  unitary  for  N  qubits,  it  might  not  work  for 
N  +  1  qubits,  since  additional  ‘spectator5  qubits  will  also  be  affected  by  the  sequence  instead  of  being  left 
unchanged  [11].  Therefore,  one  has  to  define  a  qubit  register  of  interest  where  the  unitary  will  be  compiled,  and 
the  experimental  implementation  of  the  resulting  sequence  has  to  be  limited  to  this  subregister,  as  explained  in 
section  1.1.  Moreover,  the  existing  analytical  methods  for  finding  decompositions  of  unitaries  in  terms  of  two- 
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qubit  gates  (see  for  instance  [12])  do  not  seem  to  apply  to  multi-qubit  gates.  Therefore,  in  this  work  we  employ 
an  approach  based  on  numerical  optimization. 

A  similar  algorithm  for  finding  multi- qubit  gate  decompositions  has  been  studied  in  [1 1],  where  optimal 
control  techniques  are  used  to  find  a  pulse  sequence  for  a  given  target  unitary  operation.  The  procedure 
described  there  starts  with  long  sequences  and  then  removes  pulses,  if  possible.  This  often  results  in  sequences 
with  more  entangling  operations  than  actually  required.  In  this  work  we  present  an  algorithm  designed  to 
produce  decompositions  with  a  minimal  number  of  entangling  gates.  In  addition,  we  introduce  a  deterministic 
algorithm  for  finding  decompositions  of  local  unitaries.  We  also  extend  the  algorithm  to  operations  required  for 
state  preparation  or  measurement,  which  are  particular  cases  of  more  general  operations  known  as 
isometries  [13, 14]. 

The  paper  is  organized  as  follows:  in  section  1 . 1  we  describe  precisely  which  gates  we  will  consider  as  part  of 
our  experimentally  available  toolbox,  and  review  some  architectures  for  quantum  information  processing  to 
which  the  methods  described  in  this  work  can  be  applied.  In  section  2  we  show  an  analytic  algorithm  to  compile 
local  unitaries,  which  can  be  used  to  find  efficient  implementations  of  state  and  process  tomographies.  Finally,  in 
section  3  we  describe  and  analyze  an  algorithm  to  compile  fully  general  unitaries  which  relies  on  numerical 
optimization. 

1.1.  Experimental  toolbox 

Several  quantum  information  processing  experiments  based  on  atomic  and  molecular  systems  have  similar 
toolsets  of  quantum  operations  at  their  disposal.  Often,  it  is  convenient  to  apply  collective  rotations  on  an  entire 
qubit  register.  These  collective  (yet  local)  gates,  combined  with  addressed  operations  (typically  rotations  around 
the  Z  axis)  allow  one  to  implement  arbitrary  local  unitaries,  as  we  show  in  section  2.  Together  with  suitable 
multi- qubit  operations,  arbitrary  quantum  unitaries  can  be  implemented.  In  this  work  we  consider  the 
following  set  of  gates: 

•  Collective  rotations  of  the  whole  qubit  register  about  any  axis  on  the  equator  of  the  Bloch  sphere  C  ( 6 ,  <fi). 

Here  9  is  the  rotation  angle  and  <fi  is  the  phase,  so  that: 

C(0 ,  (f))  =  e~ (S*  cos  (j)+Sy  sin  (f>)/2^  Q) 

where  Sx,y  =  ax,y  +  ■  •  •  +  cr^f  are  the  total  spin  projections  on  the  x  or y  axes,  and  ax,y,z  are  the  respective 
Pauli  operators  corresponding  to  qubit  j.  For  the  sake  of  brevity  we  also  define  rotations  around  the  X  and  Y 
axes  as: 

X(0)  =  C(0,O),  (2) 

Y(0)  =  C(0,tt/ 2).  (3) 


•  Single  qubit  rotations  around  the  Z  axis  Zn(0 ),  where  0  is  the  rotation  angle,  and  n  is  the  qubit  index: 

Zn  (0)  =  e-^n/2,  (4) 

with  azn  being  the  Pauli  Z  operator  applied  to  the  n-th  ion. 

•  Entangling  Molmer- Sorensen  (MS)  gates  [15],  with  arbitrary  rotation  angle  and  phase  MS^  ( 6 ).  Here  0  is  the 
rotation  angle  and  (j)  is  the  phase  of  the  gate,  resulting  in: 

MS^(6>)  =  e~ie  cos  ^+sy sin  /4,  (5) 

where  Sx,y  =  ax,y  +  •  •  •  +  are  the  total  spin  projections  on  the  x  or  y  axes,  as  before.  For  0  =  0  or 

(j)  =  7r/2  we  obtain  gates  that  act  around  the  X  or  Y axes,  which  we  will  denote: 

MSx,y(6>)  =  e~wsiy/4.  (6) 

As  mentioned  before,  it  is  desirable  to  be  able  to  restrict  the  action  of  the  MS  gate  to  a  particular  qubit  subset. 
This  can  be  done  experimentally  by  spectroscopically  decoupling  the  rest  of  the  qubits  from  the  computation 
[7] ,  or  by  addressing  the  MS  gate  only  on  the  relevant  subset  of  the  qubits  [16]. 

This  set  of  gates,  or  equivalent  ones,  are  available  in  several  trapped-ion  experiments  [7, 17, 18].  A  similar 
toolbox  of  operations  is  available  for  architectures  based  on  trapped-ion  hyperfine  qubits.  For  example,  [8] 
describes  high-fidelity  microwave  gates  applied  to  a  single  hyperfine  43Ca+  qubit.  In  a  multi- qubit  system,  these 
gates  would  drive  collective  rotations  like  the  ones  described  above.  In  addition,  [19]  describes  a  Raman-driven 
az  (g)  az  phase  gate  on  two  qubits,  which  applied  to  a  many- qubit  register  would  act  analogously  to  the  MS  gate 
already  described. 
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Recently,  an  implementation  of  high-fidelity  gates  in  a  2D  array  of  neutral  atom  qubits  was  reported  [9].  The 
toolbox  described  there  consists  of  global  microwave -driven  gates  and  single-site  Stark  shifts  on  the  atoms, 
which  are  completely  equivalent  to  the  local  operations  described  before  for  the  trapped- ion  architecture.  A 
multi- qubit  CNOT  gate,  equivalent  to  the  MS  gate,  could  also  be  implemented  by  means  of  long-range  Rydberg 
blockade  interactions  [20]. 

2.  Compilation  of  local  unitaries 

Local  unitaries  can  be  written  as  a  product  of  single-qubit  unitaries.  In  this  section  we  show  a  fully  deterministic 
algorithm  that  produces  decompositions  of  any  local  unitary  as  a  sequence  of  collective  equatorial  rotations  and 
addressed  Z  rotations,  as  described  in  section  1.1.  The  decompositions  presented  here  are  optimal  in  the  number 
of  pulses.  These  techniques  are  particularly  useful  for  the  implementation  of  state  and  process  tomographies,  as 
exemplified  in  figure  1,  since  both  require  only  local  operations  at  the  beginning  and  end  of  the  algorithm. 

Let  us  consider  a  register  of  N  qubits,  and  a  local  unitary  U  =  Ui  0  U2  <g>  <S>  UN  to  be  applied  to  them, 

where  Ui  is  the  action  of  the  unitary  on  the  z-th  qubit.  If  the  same  operation  has  to  be  applied  to  more  than  one 
qubit  (I Uj  =  Uj ),  we  can  replace  both  with  a  single  instance  of  the  operation,  and  then  apply  the  same  addressed 
rotations  on  all  qubits  subject  to  the  same  operation  f/z.  Therefore,  we  only  have  to  consider  the  case  where  every 
Ui  is  unique. 

In  order  to  apply  a  general  local  unitary  to  each  qubit  we  need  to  have  at  least  three  degrees  of  freedom  per 
qubit  [6],  so  the  decomposition  must  have  at  least  3 N  free  parameters.  During  the  sequence  at  least  N  —  1  of  the 
qubits  must  eventually  be  addressed,  since  a  different  unitary  has  to  be  applied  to  each  qubit.  Therefore,  a 
sequence  of  addressed  operations  of  the  form  Zx  (6^),  Z2  ( 02 )>  •  .  •  ZN_  i  (0N_  i)  must  be  included  in  the 
decomposition.  These  provide  N  —  1  parameters,  so  2 N  +  1  more  degrees  of  freedom  are  required.  The  most 
economic  way  to  provide  these  is  by  means  of  collective  gates  C  (#z,  0Z),  which  have  two  degrees  of  freedom  each, 
so  the  shortest  sequence  possible  must  include  at  least  N  global  operations,  for  a  total  of  3  N  —  1  free 
parameters.  One  additional  degree  of  freedom  remains,  so  we  must  add  a  last  gate.  This  can  be  either  an 
addressed  operation  on  qubit  N  or  a  collective  gate.  If  we  add  an  addressed  gate  ZN,  we  obtain  a  sequence  of  the 
form: 


U  —  ZN  CNZN_  i  CN_  i  •  •  •  Z2  C2Zi  Q,  (7) 

where  Q  =  C  (i 9 </>•)  and  Zz  =  Zz  (0t-)  are  collective  and  single-qubit  rotations  respectively,  as  explained  in 
section  1.1.  Such  a  sequence  is  useful  for  compiling  local  unitaries  up  to  arbitrary  phases,  as  explained  in 
appendices  A.3  and  A.4.  The  second  alternative  is  to  add  a  collective  rotation  C^: 

U  =  C;CNZN_iCN_i  •••  Z2C2ZiQ,  (8) 

which  is  the  type  of  sequence  we  consider  in  this  section. 

For  particular  unitaries,  some  of  the  Q  and  Zz  in  equation  (8)  may  actually  be  the  identity,  in  which  case  the 
sequence  is  simpler.  Since  the  decomposition  depends  on  the  ordering  of  the  qubits,  by  reordering  them  a 
simpler  sequence  might  be  obtained.  For  small  numbers  of  qubits,  one  can  compile  the  unitary  for  every  possible 
permutation,  although  this  becomes  inefficient  for  large  numbers  of  qubits.  However,  let  us  remember  that,  for 
the  purposes  of  the  compilation,  the  qubits  are  grouped  together  according  to  which  of  them  experience  the 
same  single-qubit  unitary  t/z.  For  an  application  such  as  state  tomography,  there  are  only  three  possible  unitaries 
to  be  applied  to  each  qubit  in  the  register  (shown  in  figure  1),  since  one  only  wants  to  perform  a  measurement  in 
one  of  three  different  bases.  Therefore,  effectively  we  only  need  to  consider  three  qubits,  in  which  case  trying  out 
all  the  permutations  is  perfectly  feasible. 

We  will  describe  now  how  to  compile  a  generic  local  unitary  U  =  U\  U2Z)  •••  (8)  U^  exactly,  using  a 
decomposition  of  the  form  (8).  Let  us  first  note  that  the  unitaries  in  equation  (8)  act  on  the  N-qubit-Hilbert 
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space,  which  is  the  tensor  product  of  the  single-qubit-Hilbert  spaces.  For  the  sake  of  simplifying  the  notation,  we 
will  now  refer  to  these  unitaries  as  Q,  Zz,  and  will  reuse  the  notations  Q  and  Zz  for  their  action  on  the  single 
qubits,  so  that: 

C{  =  Q  0  Q  0  •••  0  Q, 

Z;  =  1  0  1  0  •••  0Zi  0  •••  0  1,  (9) 


where  1  is  the  2x2  identity  matrix,  and  Zz  appears  at  the  i-th  place  (since  it  only  addresses  the  i-th  qubit). 

In  terms  of  these  single-qubit  unitaries,  factoring  equation  (8)  for  each  qubit  we  obtain  N  equations: 

U\  =  C'nCn  •••  C2Z1C1, 

U2  =  C'N Qv  •  •  •  Z2C2 Q, 

Un  =  C'nCn  •••  C2Q.  (10) 

From  the  last  equation  we  can  determine  CN : 

c'NcN=uNcycp  ■■■  (id 

and  eliminating  this  factor  from  the  remaining  equations  we  obtain: 

1/^17!  =  Cf'ZiQ, 

uyu2  =  cr1c2-iz2c2ch 


uyuN. 


■■  cr'cp 


Qy-jZjv-iQvr-i  •••  C2Q. 


(12) 


We  solve  each  equation  in  (12)  consecutively.  To  solve  the  first  equation  in  (12),  let  us  notice  that  its  left- 
hand  side  is  a  known  unitary,  which  can  be  written  as: 

U^Ui  =  e-iQ^/2,  (13) 

where  cq  is  the  angle  of  the  rotation  and  ux  its  generator.  The  right-hand  side  is  simply  a  rotation  around  Z  and  a 
change  of  basis.  Therefore,  the  rotation  angle  ofZ:  must  be  equal  to  cq,  and  the  change  of  basis  must  be  such 
that: 


ui  =  Q  1azC1.  (14) 

We  show  in  appendix  A.  1  how  to  find  the  generator  and  angle  of  the  collective  rotation  Q. 

Having  determined  Q,  we  can  write  the  second  equation  in  (12)  as: 

qt/v'UjCf’  =  c2]z2c2.  (15) 

As  before,  the  left-hand  side  of  this  equation  is  a  known  unitary,  and  the  right-hand  side  consists  of  a  rotation 
around  Z  and  a  change  of  basis,  so  the  rotation  angle  02  and  generator  of  the  change  of  basis  C2  can  be  found  as 
for  the  previous  equation.  This  procedure  can  be  repeated  until  all  of  the  Q  and  Zk  with  k  <  N  —  1  are 
determined.  The  last  collective  operations  CN  and  can  be  determined  from  equation  (1 1).  For  this  we  need  to 
decompose  an  arbitrary  unitary  into  a  product  of  two  equatorial  rotations;  this  can  be  done  as  explained  in 
appendix  A. 2. 

We  have  shown  so  far  how  to  compile  a  local  unitary  exactly.  However,  in  certain  cases  the  constraints  on  the 
target  unitary  are  weaker,  so  that  it  can  be  implemented  with  a  simpler  sequence.  For  instance,  a  unitary  that  is 
followed  by  global  gates  whose  phase  can  be  freely  adjusted  must  only  be  specified  up  to  a  collective  Z  rotation 
afterwards,  since  this  rotation  can  be  absorbed  into  the  phase.  This  removes  one  free  parameter  from  the 
sequence,  thus  simplifying  its  implementation.  The  details  of  this  procedure  are  presented  in  appendix  A.3. 
Another  case  of  interest  is  when  the  target  unitary  is  specified  up  to  arbitrary  independent  Z  rotations  afterwards, 
for  instance  when  the  unitary  is  followed  by  a  projective  measurement  on  the  Z  basis.  This  is  particularly  useful 
for  tomographic  measurements;  details  are  shown  in  appendix  A.4. 


3.  Compilation  of  general  unitaries 

In  section  2  we  studied  how  to  compile  local  unitaries  in  terms  of  collective  and  addressed  rotations.  However,  a 
universal  quantum  computer  also  requires  entangling  unitaries,  which  must  be  compiled  into  the 
experimentally  available  local  and  entangling  gates.  For  example,  in  figure  2  we  show  a  decomposition  of  a 
Toffoli  gate  into  a  sequence  of  local  and  entangling  gates  applied  consecutively.  In  this  section,  we  present  an 
algorithm  to  find  such  decompositions  for  arbitrary  unitaries. 

We  seek  decompositions  directly  in  terms  of  multi-qubit  entangling  gates,  since  these  are  often  more 
efficient  than  decompositions  in  terms  of  two-qubit  gates.  For  example,  a  Toffoli  gate  can  be  implemented  using 
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Figure  2.  Decomposition  of  a  Toffoli  gate  into  a  pulse  sequence  of  collective  equatorial  rotations,  addressed  Z  rotations  and  entangling 
Molmer-Sorensen  (MS)  gates. 
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Figure  3.  Sequence  with  layers  of  local  and  entangling  gates  applied  consecutively. 

only  3  Molmer-Sorensen  (MS)  gates  [1 1],  while  6  CNOT  gates  are  needed  to  implement  it  [21]),  and  a  Fredkin 
gate  can  be  implemented  using  4  MS  gates  [22],  while  the  least  number  of  two-qubit  gates  required  is  5  [23].  As 
described  in  section  1.1,  many  equivalent  types  of  entangling  gates  are  experimentally  available.  We  will  consider 
MS  gates,  but  the  methods  shown  here  are  applicable  to  any  entangling  gate  that  forms  a  universal  set  together 
with  local  operations. 

3. 1 .  Compilation  in  layers 

In  many  quantum  information  processing  experiments  the  most  costly  operations  in  terms  of  fidelity  are 
entangling  gates.  Therefore,  when  trying  to  compile  a  unitary  we  seek  to  minimize  the  number  of  those.  A 
straightforward  way  to  do  this  is  to  use  pulse  sequences  where  layers  of  local  unitaries  and  entangling  gates  are 
applied  consecutively,  as  shown  in  figure  3. 

Any  unitary  can  be  decomposed  in  terms  of  single-qubit  gates  and  two-qubit  CNOT  gates  [24]: 

U=Lm  CNOTm  •••  Lx  CNOTi  L0,  (16) 

where  denotes  an  arbitrary  local  unitary  on  the  whole  qubit  register  and  CNOT*  denotes  a  gate  between  some 
two  qubits.  A  two -qubit  CNOT  gate  can  be  implemented  in  an  arbitrary  N-  qubit  register  as  a  sequence  of  local 
unitaries  and  MSX  (i r/8)  gates  [11].  Therefore,  the  following  decomposition  is  always  possible: 

U=Lm  MSx(tt/8)  •••  LiMSx(tt/8)Lo.  (17) 

However,  some  of  the  local  unitaries  L*  in  a  decomposition  of  the  form  (17)  may  actually  be  identity,  so  after 
removing  them  the  resulting  sequence  has  the  following  structure: 

U=Lm  MSx(kM7r/8)  L\  MS^(/q7r/8)  T0>  (18) 

where  the  kj  are  integers,  and  the  number  M  of  entangling  gates  is  the  same  or  less  than  in  equation  (17).  It  is 
enough  to  consider  0  <  k  <  7,  since  MSX  (7 r)  is  either  the  identity  for  an  odd  number  of  qubits,  or  a  7r  rotation 
around  X  for  an  even  number  of  qubits. 

We  now  seek  to  further  simplify  sequence  (18).  Every  single-qubit  unitary  L7Z-  on  qubit  i  can  be  written  as  a 
composition  of  rotations  around  two  different  fixed  axes  [6],  which  means  that  we  can  always  choose  an ,  cq2 
and  aft  such  that: 

Ui  =  Xf  ( ai3)Zi  (ai2 )Xi  (an).  (19) 

Any  local  unitary  L  =  n£L  1  U'  can  therefore  be  written  as: 

N 

L=  E[  Xi(aa)Zi(ai2)Xi(an),  (20) 

i=l 
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where  the  product  goes  over  the  N  qubits  in  the  register.  Since  unitaries  acting  on  different  qubits  commute,  we 
can  write  this  as: 

N  N  N 

L  =  n  Xj(aa)  n  2)  n  (21) 

i=  1  i=  1  i=l 

=  X'ZX,  (22) 

where  X  and  Z  denote  arbitrary  products  of  rotations  around  the  X  or  Z  axes  for  all  qubits.  Therefore,  the 
sequence  in  (18)  can  be  written  as: 

U  =  XmZmXmMSx  (kM7r/ 8)*  •  • 

x  x/ZiXi MS, (k7r/S)x'ZoU  (23) 

and  commuting  the  X  rotations  with  the  MS  gates  we  obtain  a  sequence  of  the  form: 

U  =  XMZMXMMSx(/cM7r/8)--' 

x  X2'z2X2MS^(a2)21MS^(J:17r/8)l0'ZoXo.  (24) 

Every  odd  local  unitary  (except  for  the  last  one)  is  a  product  of  Z  rotations  on  all  qubits,  and  the  even  local 
unitaries  can  be  grouped  as  Lz-  =  Xz  ZzXz  .  Moreover,  a  collective  Z  rotation  can  be  extracted  from  each  even 
local  unitary  L*  and  absorbed  into  the  phase  of  the  subsequent  MS  gates  and  collective  operations  to  simplify  the 
implementation  ofL*.  Therefore  the  sequence  can  be  written  as: 

U  =  Lm  MS^m  (/cmtt/8)--- 

x  L2 MS^2  (fc2 7r/8)ZiMS0i  (fci7r/8)L0.  (25) 

We  have  thus  shown  that  any  N-  qubit  unitary  U  can  be  decomposed  into  a  sequence  of  the  form  shown  in 
(25).  These  sequences  always  have  the  same  structure,  which  makes  it  easier  to  identify  patterns  if  one  wants  to 
compile  families  of  unitaries,  i.e.  unitaries  that  depend  on  some  tunable  parameter. 

3.2.  Numerical  optimization 

We  have  described  a  general  form  of  a  sequence  of  local  operations  and  global  entangling  gates  that  implements 
any  desired  target  unitary.  It  remains  to  find  the  actual  sequence  parameters,  that  is,  the  rotation  angles  and 
phases  of  the  gates.  However,  we  do  not  know  a  priori  how  many  entangling  gates  will  be  needed  for  a  given 
unitary.  Therefore  we  suggest  the  following  algorithm: 

1 .  Propose  a  sequence  with  M  =  0  entangling  gates. 

2 .  Search  numerically  for  the  sequence  parameters  that  maximize  the  fidelity  with  the  target  unitary. 

3.  If  the  sequence  has  converged  to  the  desired  unitary  (i.e.  the  fidelity  equals  1),  stop.  Otherwise  increase  Mby  1 
and  go  back  to  step  2. 

When  performing  the  numerical  optimization  in  step  2  there  might  be  a  number  of  local  optima  in  addition 
to  the  true  global  optimum,  making  fully  deterministic  optimization  methods  difficult  to  apply.  We  apply 
therefore  a  repeated  local  search ,  where  an  efficient  deterministic  optimization  method  is  iterated,  each  time 
using  randomly  determined  initial  conditions.  The  initial  conditions  are  chosen  randomly  for  every 
optimization  run,  as  experience  has  shown  us  that  starting  close  to  previously  found  local  minima  does  not  offer 
any  improvement.  The  search  is  finalized  whenever  the  fidelity  with  the  target  unitary  is  above  some  predefined 
threshold,  or  when  a  maximum  number  of  tries  is  exceeded.  An  advantage  of  this  method  is  that,  since  each 
optimization  run  starts  from  random  initial  conditions,  these  are  easy  to  perform  in  parallel. 

The  algorithm  chosen  for  each  numerical  optimization  is  the  quasi-Newton  method  of  Broyden,  Fletcher, 
Goldfarb,  and  Shanno  (BFGS)  [25].  The  function  to  be  maximized  is  the  fidelity  of  the  unitary  resulting  from  the 
pulse  sequence  with  the  target  unitary.  Since  we  are  interested  in  exact  solutions,  we  reject  those  solutions  whose 
fidelity  (normalized  to  the  maximum  value  possible)  is  not  equal  to  1  within  some  tolerance  threshold  (usually 
1  %).  The  gradient  of  the  fidelity  can  be  calculated  analytically  as  a  function  of  the  sequence  parameters,  which 
speeds  up  the  computation  as  compared  to  using  several  evaluations  of  the  fidelity  function. 

A  previously  used  approach  to  this  optimization  problem  was  a  combination  of  local  gradient  descent  and 
simulated  annealing  (SA)  [11],  which  also  helps  to  avoid  local  maxima.  However,  this  method  did  not  make  use  of 
the  analytic  expression  for  the  fidelity  gradient,  which  speeds  up  the  search.  Moreover,  its  performance  depends  on 
the  Topography  of  the  optimization  space  and  requires  manual  tuning  of  the  search  parameters  to  achieve  optimal 
results.  We  have  compared  the  BFGS  and  simulated  annealing  approaches  by  compiling  100  random  unitaries 
uniformly  distributed  in  the  Haar  measure  as  explained  in  [26]  for  different  numbers  of  qubits.  In  our  experience, 
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Figure  4.  Average  time  required  to  find  the  global  optimum  for  100  unitaries  randomly  distributed  in  the  Haar  measure  with  the  BFGS 
and  simulated  annealing  methods,  using  an  Intel©Core  i5-4670s  CPU  550  @3.10  GHz  x  4  (one  processing  thread  per  optimization 
run).  No  data  was  obtained  for  the  simulated  annealing  approach  for  4  qubits  owing  to  the  excessive  time  required. 


we  find  that  the  BFGS  method  scales  better  with  the  number  of  qubits  than  simulated  annealing  (see  figure  4).  The 
median  number  of  search  repetitions  needed  to  find  the  global  optimum  was  1  in  all  the  cases. 

The  exponential  scaling  of  the  optimization  problem  complexity  depends  on  the  number  of  entangling  gates 
required  to  compile  a  given  unitary,  which  is  an  intrinsic  property  of  the  unitary  and  does  not  depend  on  the 
search  algorithm.  It  is  already  known  that  it  is  not  possible  to  efficiently  implement  an  arbitrary  unitary  in  terms 
of  two-qubit  gates  [6];  our  numerical  results  suggest  a  similar  result  for  N-qubit  gates.  In  the  two-qubit  case  the 
compilation  always  succeeded  with  3  entangling  gates,  and  not  less  (using  200  search  repetitions).  This  was  to  be 
expected,  since  for  two  qubits  an  MS  gate  is  equivalent  to  a  CNOT  gate,  and  it  is  known  that  3  CNOT  gates  are 
enough  (and  in  general  necessary)  to  implement  an  arbitrary  two-qubit  unitary  [27, 28].  In  the  three-qubit  case, 
the  optimization  always  succeeded  with  8  entangling  gates,  and  never  with  fewer  (also  using  200  repetitions).  For 
4  qubits,  the  optimization  always  succeeded  for  25  entangling  gates,  and  succeeded  only  4%  of  the  time  with  24 
entangling  gates.  However,  we  did  only  4  optimization  repeats  in  the  four-qubit  case,  owing  to  the  increased 
time  it  takes  for  these  to  converge.  Therefore,  it  might  be  the  case  that  given  enough  optimization  runs,  more 
unitaries  would  have  been  compiled  with  only  24  gates.  We  are  not  aware  of  any  result  in  the  literature 
concerning  the  number  of  N- qubit  global  entangling  gates  required  for  implementing  a  general  N- qubit  unitary 
for  more  than  N  =  2  qubits.  From  our  numerical  results,  we  conjecture  that  any  three-qubit  unitary  can  be 
implemented  using  at  most  8  MS  gates,  and  any  four-qubit  unitary  using  at  most  24  or  25  MS  gates. 

A  particularly  interesting  group  of  unitaries  are  Clifford  gates,  which  find  applications  in  quantum  error 
correction  [29],  randomized  benchmarking  [30],  and  state  distillation  protocols  [6].  To  explore  the  difficulty  of 
compiling  such  gates,  we  have  tested  our  algorithm  with  randomly  generated  Clifford  gates,  as  explained  in  [3 1]. 
We  show  in  figure  5  the  distribution  of  the  optimal  number  of  entangling  gates  required  for  compliging  two-, 
three-  and  four-qubit  unitaries.  Our  results  agree  with  the  literature  [32]  for  the  two-qubit  case,  since  MS  gates  are 
then  equivalent  to  controlled-Z  (or  CNOT)  gates.  For  larger  numbers  of  qubits,  the  performance  of  our  algorithm 
in  terms  of  number  of  multi-qubit  gates  required  is  also  similar  to  that  of  algorithms  based  on  two-qubit  gates  [32]. 


3.3.  Compilation  of  isometries 

A  particular  case  of  interest  is  the  compilation  of  a  unitary  whose  action  only  matters  on  certain  input  states.  This 
happens,  for  instance,  when  one  is  interested  in  state  preparation  starting  from  some  fixed  input  state.  Such 
operations  belong  to  the  more  general  class  of  operations  known  as  isometries  [14, 30].  In  this  case,  the  problem  to  be 
solved  has  less  constraints  than  when  fully  specifying  the  target  unitary,  so  a  simpler  sequence  maybe  found.  In  this 
section  we  focus  on  compiling  a  unitary  that  is  only  specified  in  a  particular  subspace  of  the  input  states,  for  example: 


^target 


U\ i  U\2  :  : 

u2\  u22  free  free 

^3i  ^32  :  ; 

U42  •  • 


(26) 
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Optimal  number  of  MS  gates 

Figure  5.  Optimal  number  of  entangling  gates  needed  to  compile  random  Clifford  operations,  for  sample  sizes  { 1000,  200,  100 }  for 
N  =  {2,  3,  4}  qubits.  Each  Clifford  gate  was  generated  using  10iV8  steps  of  the  random  walk  described  in  [31].  Err  or  bars  correspond 
to  one  standard  deviation. 


Figure  6.  Left:  a  unitary  mapping  (Toffoli  gate)  is  applied,  after  which  qubit  3  is  measured.  Right:  a  pulse  sequence  for  implementing 
the  circuit  on  the  left.  This  implementation  is  simpler  than  in  the  fully  constrained  case  (figure  2)  because  of  the  additional  degrees  of 
freedom  when  compiling. 


where  the  columns  marked  as  ‘free’  are  left  unspecified.  In  this  case,  a  suitable  fidelity  function  for  the  numerical 
optimization  is: 


f(U )  =  |tr(l/|s  IWlbl2, 


(27) 


where  U\s  is  a  rectangular  matrix  with  the  components  of  the  unitary  in  the  restricted  subspace. 

A  more  general  case  is  where  some  of  the  relative  phases  of  the  projections  of  the  unitary  acting  on  different 
subspaces  of  the  whole  Hilbert  space  are  irrelevant.  For  example,  suppose  that  one  wants  to  apply  a  unitary  to 
map  some  observable  onto  an  ancilla  qubit  and  then  measure  the  ancilla,  as  shown  in  figure  6.  Since  the  input 
state  of  qubit  3  is  known  to  be  1 0),  only  the  subspace  ofinput  states  spanned  by  { 1 000),  1 010) ,  1 100) ,  1 1 10) }  is 
relevant.  Moreover,  the  measurement  will  project  the  state  of  the  system  onto  either  the  subspace  spanned  by 
{ 1 000) ,  1 0 1 0) ,  1 1 00) },  or  that  spanned  by  { 1 1 1 1 ) } ,  and  all  phase  coherence  between  these  alternatives  will  be 
lost.  Therefore,  the  compiled  sequence  can  be  sought  such  that  it  matches  the  desired  unitary  in  each  of  the 
subspaces  but  allowing  an  arbitrary  phase  <f>  between  them: 


ITtarget 


ll 

0 

0 

0  :  ' 

0 

0 

0 

0  : 

0 

1 

0 

0  : 

0  free  0  free  0  free  0  free 

0 

0 

1 

0  : 

0 

0 

0 

0  : 

0 

0 

0 

0  : 

<0 

0 

0 

ei(f>  :  , 

(28) 


In  this  case  (figure  6)  it  is  possible  to  find  a  simpler  implementation  than  in  the  fully  constrained  case  (figure  2), 
owing  to  the  additional  degrees  of  freedom  available,  namely  arbitrary  outputs  for  the  |  ^3)  =  1 1 )  input  states 
and  an  arbitrary  relative  phase  between  the  two  possible  measurement  outcomes. 
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Figure  7.  State  fidelity  for  a  Toffoli  gate  applied  on  the  8  canonical  input  states. 


In  the  general  case  considered  here  we  want  to  maximize  the  fidelity  in  each  subspace,  without  regard  to  the 
relative  phases  between  these.  Therefore  we  can  seek  to  maximize  the  function/consisting  of  the  sum  of  the 
fidelity  functions  (27)  corresponding  to  each  subspace: 

/([/)  =  £)|tr(U|s.  Utarget|p|2,  (29) 

j 

where  the  sum  goes  over  all  the  subspaces  with  different  relative  phases,  and  U |s;.  is  a  rectangular  matrix  with  the 
components  of  the  unitary  in  thej-th  subspace. 

3.4.  Compensation  of  systematic  errors 

Owing  to  systematic  errors,  the  operations  experimentally  applied  may  still  be  unitary  but  deviate  from  the 
intended  ones.  An  example  of  this  is  addressing  crosstalk  due  to  laser  light  leaking  onto  adjacent  qubits.  If  it  is 
possible  to  characterize  the  actual  experimental  operations  being  applied,  then  they  can  be  taken  into  account 
for  the  compilation  by  adapting  our  optimization  procedure: 

1 .  Compile  the  target  unitary  in  terms  of  the  ideal  gates. 

2.  Replace  the  ideal  gates  by  the  experimentally  characterized  operations. 

3 .  Add  operations  to  obtain  a  higher  fidelity  with  the  ideal  target  unitary. 

As  an  example  we  show  that  excessive  crosstalk  can  be  corrected  in  an  implementation  of  a  Toffoli  gate. 
Figure  7  depicts  experimental  data  corresponding  to  the  action  of  the  Toffoli  gate  on  the  8  input  basis  states.  It 
can  be  seen  that,  by  adding  just  two  pulses,  the  output  fidelity  for  each  input  state  increased  in  some  cases  by  up 
to  20%.  The  sequence  with  1 1  pulses  is  actually  only  an  approximate  correction  to  the  uncorrected  case.  The 
exact  correction  requires  14  pulses,  and  actually  yields  a  lower  fidelity  than  the  approximate  one,  since  it  requires 
more  pulses  and  each  of  these  has  a  non-zero  error  probability. 


4.  Conclusions  and  outlook 

In  this  work  we  have  shown  methods  to  compile  quantum  unitaries  into  a  sequence  of  collective  rotations, 
addressed  rotations  and  global  entangling  operations.  For  local  unitaries,  we  have  demonstrated  an  analytic 
approach  that  produces  the  shortest  possible  sequences  in  the  general  case,  and  adapted  the  method  to  simplify 
the  resulting  sequences  if  some  constraints  on  the  unitary  are  lifted.  For  arbitrary  unitaries,  we  have  presented  an 
approach  that  produces  sequences  of  layered  local  and  entangling  operations.  This  approach  is  based  on  a 
numerical  optimization  procedure  that  is  faster  than  previously  used  ones,  and  the  sequences  obtained  are  by 
design  optimal  with  respect  to  the  number  of  entangling  gates.  Our  numerical  results  suggest  upper  bounds  on 
the  number  of  N-qubit  gates  required  to  implement  arbitrary  three-  and  four-qubit  unitaries. 

The  results  of  this  paper  show  that  in  many  cases  one  may  obtain  more  efficient  implementations  by 
considering  operations  more  general  than  two -qubit  entangling  gates.  However,  the  exponentially  growing 
complexity  of  decompositions  as  the  number  of  qubits  increases  points  to  the  necessity  of  keeping  the  register 
size  small. 
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Appendix  A.  Compiling  local  unitaries 

A.l.  Finding  basis  changes 

In  this  appendix  we  will  show  how  to  satisfy  equation  ( 1 4).  We  need  to  find  a  rotation  C  around  the  equator  of 
the  Bloch  sphere  such  that: 

u  =  C~1gzC ,  (Al) 

where  u  is  the  generator  of  a  given  known  unitary  U,  and  it  can  always  be  written  as: 

u  =  sin  6  cos  (j)  ax  +  sin  6  sin  <fi  ay  +  cos  6  az,  (A2) 

for  some  angles  0 ,  0. 

In  general  C  is  of  the  form: 

C  =  e-i7C/2,  (A3) 

where  7  is  its  rotation  angle  and  c  its  generator,  which  must  lie  on  the  equator  and  thus  be  a  linear  combination 
of  ax  and  ay.  If  we  propose: 

c  =  sin  ax  —  cos  ay ,  (A4) 

and  replace  in  equation  (Al),  we  find  that  the  angle  of  rotation  must  be: 

7  =0.  (A5) 


A.2.  Writing  a  unitary  as  a  product  of  two  equatorial  rotations 

We  will  show  here  how  to  decompose  an  arbitrary  unitary  as  a  product  of  two  rotations  around  the  equator  of 
the  Bloch  sphere,  namely: 

U=C2C 1.  (A6) 

The  target  unitary  can  be  written  as: 


U  =  cos 


1  —  isin 


x  (sin  6  cos  <fi  ax  +  sin  0  sin  ay  +  cos  0  az ), 


(A7) 


where  /?  is  its  rotation  angle, and  0 ,  determine  its  rotation  axis.  Similarly,  the  equatorial  rotations  can  be 

written  as: 


for  some  rotation  angles  cq  and  phases  (//. 
We  shall  asume  that: 


isin 


(cos0'  ax  +  sin  <//  ay ), 


(A8) 


QL\  —  OL2  — 

(A9) 

=  <t>  +  Ay/2, 

(A10) 

0 2  =  0-  A/2. 

(All) 

Replacing  these  into  (A6)  and  solving  for  a  and  A  we  obtain: 

C°S2(|)  =  ^cos^yj  +  ljsin20, 

(A12) 
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cos  A  = 


cos2(f)~cos(f) 

1  - cos2  (f ) 


(A  13) 


A.  3.  Unitaries  up  to  a  collective  Z  rotation 

Suppose  that  the  unitary  U  we  want  to  implement  is  followed  by  gates  whose  phase  can  be  freely  chosen.  Then  it 
must  only  be  specified  up  to  an  arbitrary  collective  rotation  Zf ,  since  this  phase  can  be  absorbed  in  the  following 
gates.  To  compile  U ,  we  shall  consider  a  decomposition  of  the  form  (7): 

U  =  ZfCN Z/v_ i Qy/_ i  •••  Z2C2Z1C1.  (A14) 

Such  a  decomposition  is  more  convenient  in  this  case  because  the  last  addressed  pulse  ZN  has  been  eliminated  by 
taking  advantage  of  the  additional  degree  of  freedom  provided  by  Z' .  We  can  now  follow  the  same  steps  as  in 
section  2.  The  unitary  CN  is  given  by: 

CN  =  Z'-1  UNCycy  •••  (A15) 

and  eliminating  this  factor  from  the  rest  of  the  equations  we  obtain: 

U^Ui  =  Cf'ZiQ, 

UM1U1  =  C{1CpZ2C2Ch 

Un1Um- i  =  C|  lC2  1  CNi | Z,v_ | C,y_ i  •••  C2C1.  (A16) 

Equation  (A16)  can  be  satisfied  in  exactly  the  same  way  as  explained  in  section  2.  In  order  to  satisfy  equation  (A15) 
we  need  to  find  a  rotation  Z'  such  that  the  generator  of  Qv  lies  on  the  equator.  This  can  be  done  as  follows. 

We  wish  to  find  how  to  satisfy  equation  (A1 5).  For  this  we  need  to  find  a  rotation  Z  around  the  Z  axis  and  a 
rotation  C  around  an  axis  on  the  equator  of  the  Bloch  sphere  such  that,  for  a  given  unitary  U ,  the  following 
equation  holds: 

C  =  ZU.  (A17) 

U  is  in  general  of  the  form: 

U  =  e-iaM/2,  (A  18) 


and  Z  is  of  the  form: 


Z  =  e-^/2.  (A  19) 

We  will  first  find  the  angle  of  rotation  / 3 .  If  we  write  out  (A1 7)  in  terms  of  the  generators  of  U  and  Z  we  have: 


(A20) 


c=Hf)i-isin(f) 

X(coS(l)l-iSin(l)“) 


Since  the  axis  of  rotation  of  C  lies  on  the  equator,  its  generator  must  not  have  any  Z  component,  and  thus: 

0  =  sin^jcos^yj  +  cos^Yjsin^yjt/z,  (A21) 


that  is: 


/ 3  =  —  2arctan^tan^yjwz 
Once  / 3  is  known,  the  unitary  on  the  right-hand  side  of  (A1 7)  is  fully  determined,  and  thus  C  as  well. 


(A22) 


A.4.  Unitaries  up  to  independent  Z  rotations 

Finally,  suppose  that  the  unitary  we  want  to  implement  is  defined  up  to  arbitrary  independent  rotations  for  each 
qubit  around  the  Z  axis.  This  is  useful  if  the  unitary  is  followed  by  a  projective  measurement,  since  any  final 
rotation  around  the  measurement  axis  for  any  qubit  simply  adds  a  phase  and  will  not  change  the  measured 
probabilities. 
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Let  us  again  consider  a  sequence  of  the  form  (7).  The  decomposition  must  now  satisfy,  for  each  qubit: 


Z[  U\  —  Cjsf  ’  •  •  C2Z1 Q, 

z'2u2  =  cN  •  •  •  Z2 C2 Ci, 
zrNuN  =  z^c^  •••  C2Cb 

where  the  Z/  are  arbitrary  rotations  around  the  Z  axis.  As  before,  we  can  set  ZN  =  1  and  find  C^: 


Qv  —  z'N  UNCl  1 C2 


C\j 


Eliminating  CN  from  the  remaining  equations  we  obtain: 


Un1Z'n~1Z(  U\  =  Cf'ZiQ, 
U^Z^'Z^  =  Cf'Cj-'ZjQQ, 


u, 


N 1  Z^r  'Z^_  ]  C/\7-  1  —  C[  1  •••  C^ljZjvf-lQv-l 


Q. 


(A23) 

(A24) 


(A25) 


Each  equation  has  now  an  extra  degree  of  freedom  coming  from  the  angle  of  the  Z'k  rotation.  Let  us  for 
simplicity  consider  the  case  where  the  number  of  qubits  N  is  odd.  If  we  group  equation  (A25)  in  pairs  we  get  two 
degrees  of  freedom  per  pair,  which  can  be  used  to  remove  one  of  the  global  operations.  Therefore  we  will  discard 
every  even-numbered  global  operation  C2k  from  our  decomposition  and  look  for  the  solution  of  the  following 
system  of  equations: 


t/v'Z/'L/,  =  C^ZiQ, 

UNlZ2"U2  =  C1~1Z2Ch 

uyzi'u3  =  cric3-iz3c3c1, 
uN'z"u4  =  Cf  1c3-1z4c  3cb 


^n'^N-2  Un-2  —  Q  1  ■"  f'N—  2  ^N—  2  CiC—  2  Q> 
U^Z^-i Un—  i  =  Q  1  CnL2ZN-iCN-2  ■■■  Q, 


(A26) 


where  Z\[  =  Z'N  ]  Z[.  If  the  number  of  qubits  N  is  even,  then  the  last  equation  is  simply  left  unpaired.  It  is  easy  to 
verify  that  for  each  pair  of  equations  the  right-hand  sides  commute,  and  therefore  we  must  have: 


[UnZ^. j l/at-i,  UyzHkU2k]  =  0,  (A27) 

or  equivalently: 

[Z"*_ ,  U2k _ i  uy,  ZZk  U2k  uy]=0.  (A28) 

In  order  to  solve  equation  (A28)  we  need  to  find  rotations  Z\  —  Z  (/?i),  Z2  =  Z  (/?2)  that  satisfy  a  general 
equation  of  the  form: 

[Zi  Uh  Z2  U2\  0,  (A29) 

for  given  arbitrary  Uh  U2,  whose  generators  are  ux  and  u2  respectively. 

Let  us  define: 

Vi  =  ZiUh  (A30) 

and  let  v,-  be  the  generators  of  the  Vj.  In  order  to  satisfy  (A29),  the  vt-  must  satisfy: 

Ei  =  v2  =  v,  (A31) 

since  if  two  unitaries  commute  their  generators  must  be  the  same.  Our  first  goal  is  to  determine  the  generator  v. 
Let  us  consider  the  unitary: 

Wi  =  Z}/2UiZ}/2.  (A32) 

By  writing  down  Wj  explicitly  in  terms  of  the  generators  of  each  factor,  it  can  be  seen  that  its  generator  wz-  satisfies: 

{wb  [azy  «,■]}=  0.  (A33) 

Since  we  have: 

Vi  =  Zy2WiZr1/2,  (A34) 
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from  equation  (A3 3)  we  see  that: 

{v,  Zj/2[az,  Mj]Z“1/2}  =  0.  (A35) 

The  geometrical  meaning  of  this  equation  is  that  the  vector  defined  by  v  on  the  Bloch  sphere  is  perpendicular  to 
that  defined  by  Z-1//: 2  [az,  uz-]Zf_1//2.  Since  (A3  5)  must  hold  for  i  =  1,  2,  v  must  correspond  to  the  cross  product 
of  these  vectors: 


v  =  X[Ziy2[z,  MjJZf1/2,  z¥2[az,  u2]Zp/2], 

where  AT  is  chosen  such  that: 

— tr(v2)  =  1. 

2 


(A3  6) 

(A3  7) 


Having  found  v,  it  remains  to  find  the  rotation  angles  Now,  v  must  satisfy  [Zj  v]  =  0,  and  therefore: 

UivUr1  =  ZtfiT'vZtfi).  (A38) 

Both  v  and  Uj  are  known,  so  v  and  U{  vUf 1  can  be  written  down  explicitely  as: 

v  =  sin  9  cos  <f>  <7X  +  sin  9  sin  <f>  ay  +  cos  9  az ,  (A39) 

U[vU ffl  =  sin  9  cos  </)■  <jx  +  sin  9  sin  ft.  ay  +  cos  9  <jz ,  (A40) 

and  therefore: 

Pi  =  ^  -  4  (A41) 


We  have  shown  how  to  find  suitable  rotations  Z"  that  fulfill  condition  (A28).  Once  these  are  found,  all  the 
left-hand  sides  of  (A26)  are  known  unitaries  and  the  system  can  be  solved  as  before.  The  last  collective  rotation 
CN  can  be  determined  from  (A24)  as  shown  in  appendix  A. 3.  We  have  thus  shown  how  to  compile  the  sought 
unitary  U  into  a  sequence  of  the  form: 


f  C]\y Zjy_  i Zjy_ 2  Cn— 2  C3Z2Z1Q  for  odd  N, 
\CnZN-iCn-i  •••  C3Z2Z1C1  for  even  N. 


(A42) 
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